AITopics | Dutchess County

Dutchess County

Efficient Uncertainty Estimation for LLM-based Entity Linking in Tabular Data

Bono, Carlo, Belotti, Federico, Palmonari, Matteo

arXiv.org Machine Learning, Oct-3-2025

Linking textual values in tabular data to their corresponding entities in a Knowledge Base is a core task across a variety of data integration and enrichment applications. Although Large Language Models (LLMs) have shown state-of-the-art performance in Entity Linking (EL) tasks, their deployment in real-world scenarios requires not only accurate predictions but also reliable uncertainty estimates, which in turn demand resource-intensive multi-shot inference, seriously limiting their practical applicability. As a more efficient alternative, we investigate a self-supervised approach for estimating uncertainty from single-shot LLM outputs using token-level features, reducing the need for multiple generations. Evaluation is performed on an EL task on tabular data across multiple LLMs, showing that the resulting uncertainty estimates are highly effective in detecting low-accuracy outputs. This is achieved at a fraction of the computational cost, offering a practical, cost-effective way to integrate uncertainty estimation into LLM-based EL workflows.
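
The abstract does not list which token-level features the method uses. As a minimal sketch, assuming the LLM API returns each generated token's log-probability plus the top-k candidate log-probabilities at every position, the hypothetical helper `token_features` below condenses a single generation into common confidence signals: mean and maximum negative log-likelihood, mean and maximum entropy over the renormalised top-k distribution, and whole-sequence probability. Features like these would then feed an uncertainty estimator, which is not shown here.

```python
import math
from typing import Dict, List


def token_features(token_logprobs: List[float],
                   top_k_logprobs: List[Dict[str, float]]) -> Dict[str, float]:
    """Summarise one generation's per-token scores into confidence features.

    token_logprobs: natural-log probability of each generated token.
    top_k_logprobs: per position, a mapping candidate token -> log-probability
    (hypothetical input format; assumed to come from the serving API).
    """
    n = len(token_logprobs)
    if n == 0 or n != len(top_k_logprobs):
        raise ValueError("need one top-k distribution per generated token")
    nll = [-lp for lp in token_logprobs]  # per-token surprisal
    entropies = []
    for dist in top_k_logprobs:
        probs = [math.exp(lp) for lp in dist.values()]
        total = sum(probs)  # top-k mass; renormalise so entropy is well defined
        entropies.append(-sum((p / total) * math.log(p / total)
                              for p in probs if p > 0))
    return {
        "mean_nll": sum(nll) / n,
        "max_nll": max(nll),
        "mean_entropy": sum(entropies) / n,
        "max_entropy": max(entropies),
        "seq_prob": math.exp(-sum(nll)),  # probability of the whole output
    }


# Toy two-token output: a confident first token, then a coin flip
# between two hypothetical candidate entity IDs.
feats = token_features(
    [math.log(0.9), math.log(0.5)],
    [{"Q1": math.log(0.9), "Q2": math.log(0.1)},
     {"A": math.log(0.5), "B": math.log(0.5)}],
)
```

Only one generation is needed per cell, which is where the cost saving over multi-shot sampling comes from.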

accuracy, serie, time complexity

arXiv.org Machine Learning

2510.01251

Country:

Europe > Ireland (0.14)
Europe > Spain > Galicia > Madrid (0.06)
North America > United States > South Dakota (0.05)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (0.93)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Lecture I: Governing the Algorithmic City

Lazar, Seth

arXiv.org Artificial Intelligence, Oct-17-2024

A century ago, John Dewey observed that '[s]team and electricity have done more to alter the conditions under which men associate together than all the agencies which affected human relationships before our time'. In the last few decades, computing technologies have had a similar effect. Political philosophy's central task is to help us decide how to live together, by analysing our social relations, diagnosing their failings, and articulating ideals to guide their revision. But these profound social changes have left scarcely a dent in the model of social relations that (analytical) political philosophers assume. This essay aims to reverse that trend. It first builds a model of our novel social relations as they are now, and as they are likely to evolve, and then explores how those differences affect our theories of how to live together. I introduce the 'Algorithmic City', the network of algorithmically-mediated social relations, then characterise the intermediary power by which it is governed. I show how algorithmic governance raises new challenges for political philosophy concerning the justification of authority, the foundations of procedural legitimacy, and the possibility of justificatory neutrality.

governance, large language model, machine learning

arXiv.org Artificial Intelligence

2410.2072

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)

Genre: Research Report (0.64)

Industry:

Media (1.00)
Law (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Lecture II: Communicative Justice and the Distribution of Attention

Lazar, Seth

arXiv.org Artificial Intelligence, Oct-17-2024

Algorithmic intermediaries govern the digital public sphere through their architectures, amplification algorithms, and moderation practices. In doing so, they shape public communication and distribute attention in ways that were previously infeasible with such subtlety, speed and scale. From misinformation and affective polarisation to hate speech and radicalisation, the many pathologies of the digital public sphere attest that they could do so better. But what ideals should they aim at? Political philosophy should be able to help, but existing theories typically assume that a healthy public sphere will spontaneously emerge if only we get the boundaries of free expression right. They offer little guidance on how to intentionally constitute the digital public sphere. In addition to these theories focused on expression, we need a further theory of communicative justice, targeted specifically at the algorithmic intermediaries that shape communication and distribute attention. This lecture argues that political philosophy urgently owes an account of how to govern communication in the digital public sphere, and introduces and defends a democratic egalitarian theory of communicative justice.

data mining, large language model, machine learning

arXiv.org Artificial Intelligence

2410.20718

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Media > News (1.00)
Information Technology > Services (1.00)
Law > Civil Rights & Constitutional Law (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)

Zero Inflation as a Missing Data Problem: a Proxy-based Approach

Phung, Trung, Lee, Jaron J. R., Oladapo-Shittu, Opeyemi, Klein, Eili Y., Gurses, Ayse Pinar, Hannum, Susan M., Weems, Kimberly, Marsteller, Jill A., Cosgrove, Sara E., Keller, Sara C., Shpitser, Ilya

arXiv.org Artificial Intelligence, Jul-2-2024

A common type of zero-inflated data has certain true values incorrectly replaced by zeros due to data recording conventions (rare outcomes assumed to be absent) or details of data recording equipment (e.g. artificial zeros in gene expression data). Existing methods for zero-inflated data either fit the observed data likelihood via parametric mixture models that explicitly represent excess zeros, or aim to replace excess zeros with imputed values. If the goal of the analysis relies on knowing true data realizations, a particular challenge with zero-inflated data is identifiability, since it is difficult to correctly determine which observed zeros are real and which are inflated. This paper views zero-inflated data as a general type of missing data problem, where the observability indicator for a potentially censored variable is itself unobserved whenever a zero is recorded. We show that, without additional assumptions, target parameters involving a zero-inflated variable are not identified. However, if a proxy of the missingness indicator is observed, a modification of the effect restoration approach of Kuroki and Pearl allows identification and estimation, provided the proxy-indicator relationship is known. If this relationship is unknown, our approach yields a partial identification strategy for sensitivity analysis. Specifically, we show that only certain proxy-indicator relationships are compatible with the observed data distribution. We give an analytic bound for this relationship in cases with a categorical outcome, which is sharp in certain models. For more complex cases, sharp numerical bounds may be computed using methods in Duarte et al. [2023]. We illustrate our method via simulation studies and a data application on central line-associated bloodstream infections (CLABSIs).
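
To illustrate why a known proxy-indicator relationship enables identification, here is a simplified sketch of the matrix-inversion idea behind effect restoration, for a binary indicator R (1 = true value observed) and a binary proxy W, for example within the subpopulation of recorded zeros. It assumes the proxy error rates P(W=1 | R=r) are known and ignores covariates and the outcome model, so it is not the paper's estimator; the helper name `restore_indicator` and all numbers are hypothetical.

```python
from typing import Tuple


def restore_indicator(p_w1: float,
                      p_w1_given_r1: float,
                      p_w1_given_r0: float) -> Tuple[float, float]:
    """Recover (P(R=1), P(R=0)) from the observed P(W=1) and a known proxy model.

    Solves P(W=1) = P(W=1|R=1) * q + P(W=1|R=0) * (1 - q) for q = P(R=1),
    i.e. inverts the 2x2 proxy-error matrix.
    """
    denom = p_w1_given_r1 - p_w1_given_r0
    if abs(denom) < 1e-12:
        raise ValueError("uninformative proxy: P(W=1|R) does not depend on R")
    q = (p_w1 - p_w1_given_r0) / denom
    if not 0.0 <= q <= 1.0:
        # No valid distribution of R reproduces the observed proxy rate.
        raise ValueError("assumed P(W|R) is incompatible with the observed P(W)")
    return q, 1.0 - q


# Hypothetical proxy: fires 90% of the time when R=1 and 20% when R=0.
p_r1, p_r0 = restore_indicator(0.41, 0.9, 0.2)
```

The range check mirrors the abstract's point that only some proxy-indicator relationships are compatible with the observed data: if the solved P(R=1) falls outside [0, 1], the assumed P(W | R) cannot have produced the observed P(W).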

assumption, constraint, identification

arXiv.org Artificial Intelligence

2406.00549

Country:

North America > United States > Maryland > Baltimore (0.05)
North America > United States > New York > Dutchess County > Poughkeepsie (0.04)
North America > United States > District of Columbia (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Quality (0.63)

Ask-EDA: A Design Assistant Empowered by LLM, Hybrid RAG and Abbreviation De-hallucination

Shi, Luyao, Kazda, Michael, Sears, Bradley, Shropshire, Nick, Puri, Ruchir

arXiv.org Artificial Intelligence, Jun-3-2024

Electronic design engineers are challenged to find relevant information efficiently for a myriad of tasks within design construction, verification and technology development. Large language models (LLMs) have the potential to help improve productivity by serving as conversational agents that effectively function as subject-matter experts. In this paper we demonstrate Ask-EDA, a chat agent designed to serve as a 24/7 expert available to provide guidance to design engineers. Ask-EDA leverages an LLM, hybrid retrieval augmented generation (RAG) and abbreviation de-hallucination (ADH) techniques to deliver more relevant and accurate responses. We curated three evaluation datasets, namely q2a-100, cmds-100 and abbr-100, each tailored to assess a distinct aspect: general design question answering, design command handling and abbreviation resolution. We demonstrated that hybrid RAG offers over a 40% improvement in Recall on the q2a-100 dataset and over a 60% improvement on the cmds-100 dataset compared to not using RAG, while ADH yields over a 70% enhancement in Recall on the abbr-100 dataset. The evaluation results show that Ask-EDA can effectively respond to design-related inquiries.
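
The abstract does not say how Ask-EDA merges its retrievers. One common way to build a hybrid retriever is to run a sparse (keyword) search and a dense (embedding) search separately and combine their rankings with reciprocal rank fusion (RRF). The sketch below assumes that approach, with the conventional constant k = 60; it is not necessarily what Ask-EDA does, and the document IDs are hypothetical.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks; documents missing from a list get nothing from it.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical hits for one design question from each retriever.
sparse_hits = ["d1", "d2", "d3"]  # e.g. keyword search over tool manuals
dense_hits = ["d3", "d1", "d4"]   # e.g. embedding similarity search
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

Documents found by both retrievers (d1 and d3) rise to the top, which is the usual motivation for fusing keyword and semantic search: exact command names and free-form questions each favour a different retriever.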

dataset, llm, retrieval

arXiv.org Artificial Intelligence

2406.06575

Country:

North America > United States > Texas > Travis County > Austin (0.05)
Europe > United Kingdom > England > Shropshire (0.05)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Support Vector Machine Implementation on MPI-CUDA and Tensorflow Framework

Elgarhy, Islam

arXiv.org Artificial Intelligence, Nov-24-2023

The Support Vector Machine (SVM) algorithm has a high computational cost (in both memory and time) because training requires solving a complex quadratic programming (QP) optimization problem. Consequently, SVM demands capable computing hardware. Central processing unit (CPU) clock frequencies cannot be increased further because of the physical limits of miniaturization. However, parallel architectures, available in both multi-core CPUs and highly scalable GPUs, offer a promising way to improve algorithm performance, and therefore an opportunity to reduce the high computational time SVM needs to solve the QP optimization problem. This paper presents a comparative study that implements the SVM algorithm on different parallel architecture frameworks. The experimental results show that the SVM MPI-CUDA implementation achieves a speedup over the SVM TensorFlow implementation on different datasets. Moreover, the SVM TensorFlow implementation provides a cross-platform solution that can be migrated to alternative hardware components, which reduces development time.
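
For reference, the QP that SVM training solves is commonly written as the standard soft-margin dual below, with training pairs (x_i, y_i), labels y_i in {-1, +1}, kernel K, regularisation constant C, and one multiplier alpha_i per sample. The abstract does not state which formulation or solver each implementation uses, so treat this as background rather than the paper's exact problem.

```latex
\max_{\alpha \in \mathbb{R}^n} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C \;\; (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

The double sum runs over an n x n kernel matrix, so memory and time grow quadratically in the number of training samples n if that matrix is stored in full. Each entry K(x_i, x_j) can be computed independently of the others, which is one reason the problem maps well onto multi-core CPUs and GPUs.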

algorithm, dataset, implementation

arXiv.org Artificial Intelligence

2311.14908

Country:

North America > United States > Wisconsin (0.05)
North America > United States > New York > Dutchess County > Poughkeepsie (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

GUMSum: Multi-Genre Data and Evaluation for English Abstractive Summarization

Liu, Yang Janet, Zeldes, Amir

arXiv.org Artificial Intelligence, Jun-19-2023

Automatic summarization with pre-trained language models has led to impressively fluent results, but is prone to 'hallucinations', low performance on non-news genres, and outputs which are not exactly summaries. Targeting ACL 2023's 'Reality Check' theme, we present GUMSum, a small but carefully crafted dataset of English summaries in 12 written and spoken genres for evaluation of abstractive summarization. Summaries are highly constrained, focusing on substitutive potential, factuality, and faithfulness. We present guidelines and evaluate human agreement as well as subjective judgments on recent system outputs, comparing general-domain untuned approaches, a fine-tuned one, and a prompt-based approach, to human performance. Results show that while GPT-3 achieves impressive scores, it still underperforms humans, with varying quality across genres. Human judgments reveal different types of errors in supervised, prompted, and human-generated summaries, shedding light on the challenges of producing a good summary.

computational linguistic, machine learning, natural language

arXiv.org Artificial Intelligence

2306.11256

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Media > News (0.47)
Education (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

What Are The Future Disruptive Trends In A Volatile 2023

#artificialintelligence, Dec-30-2022, 18:25:56 GMT

The year 2023 is set to be revolutionary for technology, with many disruptive trends expected to reshape how businesses function and how people interact with each other. From metaverse-based virtual workspaces, advancements in quantum computing and green energy sources to innovations in robots and satellite connectivity – here's a look at the technological trends that could define the coming year. According to BCG's "Mind the Tech Gap" survey, a majority of businesses across 13 countries plan to increase their spending on digital transformation in 2023 vs. 2022. The top two areas for future investments are business model transformation and sustainability, with respondents expressing concern over the uncertain return on investment from digital transformation initiatives.

investment, quantum computing, robot

#artificialintelligence

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.15)
Asia > Japan (0.06)
North America > United States > New York > Dutchess County > Poughkeepsie (0.05)

Industry:

Information Technology (1.00)
Energy > Renewable (1.00)
Government > Regional Government > North America Government > United States Government (0.71)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.30)

Executive Managed Seminal Computer System at IBM

WSJ.com: WSJD - Technology, Nov-25-2022, 16:14:00 GMT

Dr. Frederick P. Brooks Jr. liked building things, first laying foundations for modern computer systems at International Business Machines Corp. and later at the University of North Carolina, where he started the computer-science department. Dr. Brooks managed the development of IBM's System/360 family of compatible mainframe computers and then the software system that went with them during the 1960s. The computers became some of IBM's most popular models of the era, offering customers a choice of big or small computers with different processing speeds that could be used for both business and scientific tasks. The system was easy to expand since all the hardware ran off the same software, a departure from other systems that required software reprogramming when computers were added.

brook, computer, ibm

WSJ.com: WSJD - Technology

Country:

North America > United States > North Carolina > Pitt County > Greenville (0.05)
North America > United States > North Carolina > Orange County > Chapel Hill (0.05)
North America > United States > North Carolina > Durham County > Durham (0.05)
North America > United States > New York > Dutchess County > Poughkeepsie (0.05)

Genre: Personal (0.49)

Industry:

Information Technology (1.00)
Education > Educational Setting > Higher Education (0.72)

Technology: Information Technology > Artificial Intelligence (1.00)

Applying Association Rules Mining to Investigate Pedestrian Fatal and Injury Crash Patterns Under Different Lighting Conditions

Hossain, Ahmed, Sun, Xiaoduan, Thapa, Raju, Codjoe, Julius

arXiv.org Artificial IntelligenceNov-6-2022

The pattern of pedestrian crashes varies greatly with lighting conditions, underscoring the need to examine pedestrian crashes under different lighting conditions. Using Louisiana pedestrian fatal and injury crash data (2010-2019), this study applied Association Rules Mining (ARM) to identify hidden patterns of crash risk factors under three lighting conditions (daylight, dark-with-streetlight, and dark-no-streetlight). The generated rules show that daylight pedestrian crashes are associated with children (under 15 years), senior pedestrians (over 64 years), older drivers (over 64 years), and driver behaviors such as failure to yield, inattention/distraction, and illness/fatigue/falling asleep. Additionally, young drivers (15-24 years) are involved in severe pedestrian crashes in daylight conditions. The study also found pedestrian alcohol/drug involvement to be the most frequent item in the dark-with-streetlight condition. This crash type is particularly associated with pedestrian action (crossing at an intersection or midblock), driver age (55-64 years), speed limit (30-35 mph), and area type (business with mixed residential). Fatal pedestrian crashes are associated with roadways with high speed limits (>50 mph) in the dark-no-streetlight condition. Other risk factors linked with crashes on high-speed-limit roads include pedestrians walking with or against traffic, pedestrians wearing dark clothing, and pedestrian alcohol/drug involvement. The findings are expected to improve understanding of the relationships between pedestrian crash risk factors and specific lighting conditions. Highway safety experts can use these findings to guide the selection of effective countermeasures for strategically reducing pedestrian crashes.
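
ARM ranks rules such as {dark-no-streetlight, speed limit > 50 mph} => {fatal} using interestingness measures. The sketch below computes three common ones: support, confidence, and lift. It scores a single rule over a tiny set of hypothetical crash records invented for illustration (not the Louisiana data); a full ARM run, for example with Apriori, would also enumerate candidate itemsets, and the abstract does not state which thresholds or measures the study used.

```python
from typing import List, Set, Tuple


def rule_metrics(transactions: List[Set[str]],
                 antecedent: Set[str],
                 consequent: Set[str]) -> Tuple[float, float, float]:
    """Support, confidence and lift of the rule antecedent => consequent."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_c = sum(1 for t in transactions if consequent <= t)
    count_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = count_ac / n                                  # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0     # P(C | A)
    lift = confidence / (count_c / n) if count_c else 0.0   # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical crash records (items = attribute values), invented for illustration.
crashes = [
    {"dark-no-streetlight", "speed>50", "fatal"},
    {"dark-no-streetlight", "speed>50", "fatal"},
    {"dark-no-streetlight", "speed30-35", "injury"},
    {"daylight", "speed>50", "injury"},
    {"daylight", "speed30-35", "injury"},
]
support, confidence, lift = rule_metrics(
    crashes, {"dark-no-streetlight", "speed>50"}, {"fatal"})
```

In this toy data the rule holds in every record matching the antecedent (confidence 1.0), and a lift of 2.5 means fatal outcomes are 2.5 times more frequent among those records than across all five records.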

artificial intelligence, machine learning, pedestrian crash

arXiv.org Artificial Intelligence

doi: 10.1177/03611981221076120

2211.03187

Country:

North America > United States > District of Columbia > Washington (0.04)
North America > United States > California (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
